Metrics, Statistics, Tests

Author

  • Tetsuya Sakai
Abstract

This lecture is intended to serve as an introduction to Information Retrieval (IR) effectiveness metrics and their usage in IR experiments using test collections. Evaluation metrics are important because they are inexpensive tools for monitoring technological advances. This lecture covers a wide variety of IR metrics (except for those designed for XML retrieval, as there is a separate lecture dedicated to this topic) and discusses some methods for evaluating evaluation metrics. It also briefly covers computer-based statistical significance testing. The takeaways for IR experimenters are: (1) It is important to understand the properties of IR metrics and choose or design appropriate ones for the task at hand; (2) Computer-based statistical significance tests are simple and useful, although statistical significance does not necessarily imply practical significance, and statistical insignificance does not necessarily imply practical insignificance; and (3) Several methods exist for discussing which metrics are “good,” although none of them is perfect.
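Takeaway (2) refers to computer-based significance tests such as randomization and bootstrap tests. As a minimal sketch only (the function name, the NumPy implementation, and the per-topic nDCG values are illustrative assumptions, not material from the lecture), a paired randomization test over the per-topic scores of two systems could look like this:

import numpy as np

def paired_randomization_test(scores_a, scores_b, n_iter=10000, seed=0):
    """Two-sided paired randomization test on per-topic effectiveness scores.

    Under the null hypothesis the per-topic differences are symmetric around
    zero, so their signs can be flipped at random.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = abs(diffs.mean())
    hits = 0
    for _ in range(n_iter):
        signs = rng.choice([-1.0, 1.0], size=diffs.size)  # random relabelling
        if abs((signs * diffs).mean()) >= observed:
            hits += 1
    return hits / n_iter

# Hypothetical per-topic nDCG scores for two runs over ten topics.
run_x = [0.42, 0.55, 0.31, 0.60, 0.48, 0.39, 0.71, 0.52, 0.44, 0.58]
run_y = [0.40, 0.50, 0.33, 0.55, 0.45, 0.36, 0.65, 0.49, 0.41, 0.57]
print(paired_randomization_test(run_x, run_y))

A small p-value only says that the observed difference would be unlikely under the null hypothesis; as the abstract stresses, it says nothing by itself about practical significance.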


Similar resources

Evaluating Information Retrieval Metrics Based on Bootstrap Hypothesis Tests

This paper describes how the bootstrap approach to statistics can be applied to the evaluation of IR effectiveness metrics. More specifically, we describe straightforward methods for comparing the discriminative power of IR metrics based on Bootstrap Hypothesis Tests. Unlike the somewhat ad hoc Swap Method proposed by Voorhees and Buckley, our Bootstrap Sensitivity Methods estimate the overall ...
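As a rough illustration of the idea in this snippet (a sketch assuming per-topic score matrices and a simple shift-to-the-null bootstrap, not the paper's exact procedure), the discriminative power of a metric can be estimated by testing every pair of runs and counting how often a significant difference is detected:

import numpy as np
from itertools import combinations

def paired_bootstrap_pvalue(scores_a, scores_b, n_boot=1000, seed=0):
    """Bootstrap test on per-topic score differences, shifted to satisfy the null."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    t_obs = abs(diffs.mean()) / (diffs.std(ddof=1) / np.sqrt(diffs.size))
    centred = diffs - diffs.mean()  # impose the null hypothesis of zero mean difference
    hits = 0
    for _ in range(n_boot):
        sample = rng.choice(centred, size=centred.size, replace=True)
        t_star = abs(sample.mean()) / (sample.std(ddof=1) / np.sqrt(sample.size))
        if t_star >= t_obs:
            hits += 1
    return hits / n_boot

def discriminative_power(score_matrix, alpha=0.05):
    """score_matrix: runs x topics array of scores under one metric.
    Returns the proportion of run pairs judged significantly different."""
    pairs = list(combinations(range(len(score_matrix)), 2))
    significant = sum(
        paired_bootstrap_pvalue(score_matrix[i], score_matrix[j]) < alpha
        for i, j in pairs
    )
    return significant / len(pairs)

Comparing this proportion across metrics at the same significance level gives one way to argue that one metric is more discriminative than another.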


Towards Application-specific Evaluation Metrics

Classifier evaluation has historically been conducted by estimating predictive accuracy via cross-validation tests or similar methods. More recently, ROC analysis has been shown to be a good alternative. However, the characteristics vary greatly between problem domains and it has been shown that some evaluation metrics are more appropriate than others in certain cases. We argue that different p...
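To make the contrast between predictive accuracy and ROC analysis concrete, a small scikit-learn sketch (the imbalanced synthetic data set and logistic-regression classifier are my assumptions, not the paper's experiments) cross-validates the same model under both metrics:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Imbalanced binary problem: accuracy can look high simply because the
# majority class dominates, while ROC AUC measures ranking quality.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
print(f"accuracy={acc:.3f}  roc_auc={auc:.3f}")

The two scores can rank classifiers differently on the same data, which is the kind of domain-dependence the snippet above points to.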


Justification of Various Bootstraps, Permutation Tests and Rank Tests via a New Inequality for Quantile Functions

Coupled with convergence of various empirical processes in weighted metrics, a new inequality for quantile functions makes quick work of many limit theorems and allows natural extensions. Research supported in part by NSF grant DMS-8801083 and by the Netherlands Organization for Scientific Research (ZWO). Subject classification: 60F05.


Using R to Simulate Permutation Distributions for Some Elementary Experimental Designs

Null distributions of permutation tests for two-sample, paired, and block designs are simulated using the R statistical programming language. For each design and type of data, permutation tests are compared with standard normal-theory and nonparametric tests. These examples (often using real data) provide for classroom discussion of the use of metrics that are appropriate for the data. Simple programs...
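The paper works in R; as a Python analogue only (not the authors' code), the null distribution of a two-sample permutation test can be simulated like this:

import numpy as np

def two_sample_permutation_test(x, y, n_iter=10000, seed=0):
    """Simulate the permutation null distribution of the difference in means
    for a two-sample design and return a two-sided p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())
    null = np.empty(n_iter)
    for i in range(n_iter):
        perm = rng.permutation(pooled)  # random relabelling of the two groups
        null[i] = perm[:x.size].mean() - perm[x.size:].mean()
    return float(np.mean(np.abs(null) >= observed))

# Hypothetical data for two small groups.
group_a = [12.1, 9.8, 11.4, 13.0, 10.5]
group_b = [9.2, 8.7, 10.1, 9.9, 8.4]
print(two_sample_permutation_test(group_a, group_b))

The paired and block designs mentioned above follow the same pattern, with permutations restricted to within pairs or within blocks.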


Non-Random Sampling and Association Tests on Realized Returns and Risk Proxies

This paper investigates how data requirements can induce a non-random selection of observations from the reference sample to which the researcher wishes to generalize test results. We illustrate the effects of non-random sampling on results of association tests in a setting with data on one variable of interest for all observations, and frequently-missing data on another variable of interest. W...
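A toy simulation (entirely my construction, not the paper's data or design) shows one way such non-random selection can distort an association test: keeping only observations that satisfy a data requirement on one variable attenuates the estimated correlation.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
# y is built so that the population correlation between x and y is 0.5.
y = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)

full_r = np.corrcoef(x, y)[0, 1]
kept = y > 0.0  # observations survive the data requirement only when y is positive
selected_r = np.corrcoef(x[kept], y[kept])[0, 1]
print(f"full-sample r = {full_r:.3f}, selected-sample r = {selected_r:.3f}")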

Publication date: 2013